AIbase

# Multimodal intelligent agent

UI TARS 72B DPO
Apache-2.0
UI-TARS is a next-generation native GUI agent model with human-like perception, reasoning, and action capabilities that let it interact directly with graphical user interfaces (GUIs).
Multimodal Fusion, Transformers, Supports Multiple Languages
parasail-ai
179
0
Videomind 2B FT QVHighlights
BSD-3-Clause
VideoMind is a multimodal agent framework that improves video reasoning by emulating human-like cognitive processes.
Video-to-Text, Safetensors
yeliudev
20
0
© 2025 AIbase